Bangui
- North America > United States > Michigan (0.04)
- North America > Puerto Rico (0.04)
- North America > Mexico > Colima (0.04)
- (2 more...)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Health Care Technology (0.68)
Template-Based Text-to-Image Alignment for Language Accessibility: A Study on Visualizing Text Simplifications
Souayed, Belkiss, Ebling, Sarah, Gao, Yingqiang
Individuals with intellectual disabilities often have difficulties in comprehending complex texts. While many text-to-image models prioritize aesthetics over accessibility, it is not clear how visual illustrations relate to text simplifications (TS) generated from them. This paper presents a structured vision-language model (VLM) prompting framework for generating accessible images from simplified texts. We designed five prompt templates, i.e., Basic Object Focus, Contextual Scene, Educational Layout, Multi-Level Detail, and Grid Layout, each following distinct spatial arrangements while adhering to accessibility constraints such as object count limits, spatial separation, and content restrictions. Using 400 sentence-level simplifications from four established TS datasets (OneStopEnglish, SimPA, Wikipedia, and ASSET), we conducted a two-phase evaluation: Phase 1 assessed prompt template effectiveness with CLIPScores, and Phase 2 involved human annotation of generated images across ten visual styles by four accessibility experts. Results show that the Basic Object Focus prompt template achieved the highest semantic alignment, indicating that visual minimalism enhances language accessibility. Expert evaluation further identified Retro style as the most accessible and Wikipedia as the most effective data source. Inter-annotator agreement varied across dimensions, with Text Simplicity showing strong reliability and Image Quality proving more subjective. Overall, our framework offers practical guidelines for accessible content generation and underscores the importance of structured prompting in AI-generated visual accessibility tools.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States (0.14)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
- (3 more...)
- North America > United States > Michigan (0.04)
- North America > Puerto Rico (0.04)
- North America > Mexico > Colima (0.04)
- (2 more...)
- Media > Film (1.00)
- Leisure & Entertainment (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine > Health Care Technology (0.68)
MidPO: Dual Preference Optimization for Safety and Helpfulness in Large Language Models via a Mixture of Experts Framework
Qi, Yupeng, Lyu, Ziyu, Yang, Min, Wang, Yanlin, Bai, Lu, Cui, Lixin
As large language models (LLMs) are increasingly applied across various domains, enhancing safety while maintaining the helpfulness of LLMs has become a critical challenge. Recent studies solve this problem through safety-constrained online preference optimization or safety-constrained offline preference optimization. However, the safety-constrained online methods often suffer from excessive safety, which might reduce helpfulness, while the safety-constrained offline methods perform poorly in adaptively balancing safety and helpfulness. To address these limitations, we propose MidPO, a \textbf{\underline{Mi}}xture of Experts (MoE) framework for safety-helpfulness \textbf{\underline{d}}ual \textbf{\underline{P}}reference \textbf{\underline{O}}ptimization. Firstly, MidPO devises single-preference enhanced direct preference optimization approach to transform the base model into two independent experts, termed safety and helpfulness experts, and fine-tunes the two independent experts for optimal safety or helpfulness performance. Secondly, to achieve an effective balance between safety and helpfulness, MidPO incorporates the two experts into the MoE framework and designs a dynamic routing mechanism to allocate contributions from each expert adaptively. We conduct quantitative and qualitative experiments on three popular datasets to demonstrate the proposed MidPO significantly outperforms state-of-the-art approaches in both safety and helpfulness. The code and models will be released.
- North America > United States (0.28)
- Africa > Central African Republic > Bangui > Bangui (0.05)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Law (1.00)
- Health & Medicine > Therapeutic Area (0.93)
- Information Technology (0.68)
Efficient $k$-NN Search in IoT Data: Overlap Optimization in Tree-Based Indexing Structures
Benrazek, Ala-Eddine, Kouahla, Zineddine, Farou, Brahim, Seridi, Hamid, Kemouguette, Ibtissem
The proliferation of interconnected devices in the Internet of Things (IoT) has led to an exponential increase in data, commonly known as Big IoT Data. Efficient retrieval of this heterogeneous data demands a robust indexing mechanism for effective organization. However, a significant challenge remains: the overlap in data space partitions during index construction. This overlap increases node access during search and retrieval, resulting in higher resource consumption, performance bottlenecks, and impedes system scalability. To address this issue, we propose three innovative heuristics designed to quantify and strategically reduce data space partition overlap. The volume-based method (VBM) offers a detailed assessment by calculating the intersection volume between partitions, providing deeper insights into spatial relationships. The distance-based method (DBM) enhances efficiency by using the distance between partition centers and radii to evaluate overlap, offering a streamlined yet accurate approach. Finally, the object-based method (OBM) provides a practical solution by counting objects across multiple partitions, delivering an intuitive understanding of data space dynamics. Experimental results demonstrate the effectiveness of these methods in reducing search time, underscoring their potential to improve data space partitioning and enhance overall system performance.
- Africa > Middle East > Algeria > Guelma Province > Guelma (0.05)
- Africa > Middle East > Algeria > Djelfa Province > Djelfa (0.04)
- Europe > Germany > Hamburg (0.04)
- (2 more...)
"Image, Tell me your story!" Predicting the original meta-context of visual misinformation
Tonglet, Jonathan, Moens, Marie-Francine, Gurevych, Iryna
To assist human fact-checkers, researchers have developed automated approaches for visual misinformation detection. These methods assign veracity scores by identifying inconsistencies between the image and its caption, or by detecting forgeries in the image. However, they neglect a crucial point of the human fact-checking process: identifying the original meta-context of the image. By explaining what is actually true about the image, fact-checkers can better detect misinformation, focus their efforts on check-worthy visual content, engage in counter-messaging before misinformation spreads widely, and make their explanation more convincing. Here, we fill this gap by introducing the task of automated image contextualization. We create 5Pils, a dataset of 1,676 fact-checked images with question-answer pairs about their original meta-context. Annotations are based on the 5 Pillars fact-checking framework. We implement a first baseline that grounds the image in its original meta-context using the content of the image and textual evidence retrieved from the open web. Our experiments show promising results while highlighting several open challenges in retrieval and reasoning. We make our code and data publicly available.
- Africa > Ethiopia (0.14)
- North America > United States > California (0.14)
- Europe > Ukraine (0.14)
- (38 more...)
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (0.68)
A Survey of Time Series Foundation Models: Generalizing Time Series Representation with Large Language Model
Ye, Jiexia, Zhang, Weiqi, Yi, Ke, Yu, Yongzi, Li, Ziyue, Li, Jia, Tsung, Fugee
Time series data are ubiquitous across various domains, making time series analysis critically important. Traditional time series models are task-specific, featuring singular functionality and limited generalization capacity. Recently, large language foundation models have unveiled their remarkable capabilities for cross-task transferability, zero-shot/few-shot learning, and decision-making explainability. This success has sparked interest in the exploration of foundation models to solve multiple time series challenges simultaneously. There are two main research lines, namely pre-training foundation models from scratch for time series and adapting large language foundation models for time series. They both contribute to the development of a unified model that is highly generalizable, versatile, and comprehensible for time series analysis. This survey offers a 3E analytical framework for comprehensive examination of related research. Specifically, we examine existing works from three dimensions, namely Effectiveness, Efficiency and Explainability. In each dimension, we focus on discussing how related works devise tailored solution by considering unique challenges in the realm of time series. Furthermore, we provide a domain taxonomy to help followers keep up with the domain-specific advancements. In addition, we introduce extensive resources to facilitate the field's development, including datasets, open-source, time series libraries. A GitHub repository is also maintained for resource updates (https://github.com/start2020/Awesome-TimeSeries-LLM-FM).
- North America > United States > New York > New York County > New York City (0.14)
- Asia > China > Hong Kong (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- (16 more...)
- Overview (1.00)
- Research Report > New Finding (0.67)
- Research Report > Experimental Study (0.46)
Assessing the Impact of a Supervised Classification Filter on Flow-based Hybrid Network Anomaly Detection
Macko, Dominik, Goldschmidt, Patrik, Pištek, Peter, Chudá, Daniela
Constant evolution and the emergence of new cyberattacks require the development of advanced techniques for defense. This paper aims to measure the impact of a supervised filter (classifier) in network anomaly detection. We perform our experiments by employing a hybrid anomaly detection approach in network flow data. For this purpose, we extended a state-of-the-art autoencoder-based anomaly detection method by prepending a binary classifier acting as a prefilter for the anomaly detector. The method was evaluated on the publicly available real-world dataset UGR'16. Our empirical results indicate that the hybrid approach does offer a higher detection rate of known attacks than a standalone anomaly detector while still retaining the ability to detect zero-day attacks. Employing a supervised binary prefilter has increased the AUC metric by over 11%, detecting 30% more attacks while keeping the number of false positives approximately the same.
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- Europe > Austria > Vienna (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- (5 more...)
- Overview (0.93)
- Research Report > New Finding (0.46)
Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data
Naggita, Keziah, LaChance, Julienne, Xiang, Alice
Biases in large-scale image datasets are known to influence the performance of computer vision models as a function of geographic context. To investigate the limitations of standard Internet data collection methods in low- and middle-income countries, we analyze human-centric image geo-diversity on a massive scale using geotagged Flickr images associated with each nation in Africa. We report the quantity and content of available data with comparisons to population-matched nations in Europe as well as the distribution of data according to fine-grained intra-national wealth estimates. Temporal analyses are performed at two-year intervals to expose emerging data trends. Furthermore, we present findings for an ``othering'' phenomenon as evidenced by a substantial number of images from Africa being taken by non-local photographers. The results of our study suggest that further work is required to capture image data representative of African people and their environments and, ultimately, to improve the applicability of computer vision models in a global context.
- Asia > Brunei (0.14)
- North America > Canada > Quebec > Montreal (0.06)
- Africa > Sierra Leone (0.06)
- (142 more...)
- Health & Medicine (0.92)
- Information Technology > Services (0.75)
- Government > Regional Government (0.46)
- Information Technology > Communications > Social Media (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
Recovering Private Text in Federated Learning of Language Models
Gupta, Samyak, Huang, Yangsibo, Zhong, Zexuan, Gao, Tianyu, Li, Kai, Chen, Danqi
Federated learning allows distributed users to collaboratively train a model while keeping each user's data private. Recently, a growing body of work has demonstrated that an eavesdropping attacker can effectively recover image data from gradients transmitted during federated learning. However, little progress has been made in recovering text data. In this paper, we present a novel attack method FILM for federated learning of language models (LMs). For the first time, we show the feasibility of recovering text from large batch sizes of up to 128 sentences. Unlike image-recovery methods that are optimized to match gradients, we take a distinct approach that first identifies a set of words from gradients and then directly reconstructs sentences based on beam search and a prior-based reordering strategy. We conduct the FILM attack on several large-scale datasets and show that it can successfully reconstruct single sentences with high fidelity for large batch sizes and even multiple sentences if applied iteratively. We evaluate three defense methods: gradient pruning, DPSGD, and a simple approach to freeze word embeddings that we propose. We show that both gradient pruning and DPSGD lead to a significant drop in utility. However, if we fine-tune a public pre-trained LM on private text without updating word embeddings, it can effectively defend the attack with minimal data utility loss. Together, we hope that our results can encourage the community to rethink the privacy concerns of LM training and its standard practices in the future.
- North America > Puerto Rico (0.04)
- North America > Mexico > Colima (0.04)
- North America > United States > Michigan (0.04)
- (5 more...)
- Media > Film (1.00)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Leisure & Entertainment > Games > Computer Games (0.46)